
Fix replica unable to trigger migration when it received CLUSTER SETSLOT in advance #981

Merged
merged 11 commits into valkey-io:unstable from enjoy-binbin:better_replica_migration
Sep 13, 2024

Conversation

@enjoy-binbin (Member) commented Sep 2, 2024

When the primary's and replica's cluster-allow-replica-migration configuration items
differ, replica migration does not meet expectations.

Originally: the primary disallows migration and the replica allows it. In this case, the
code will migrate the replica away, and the primary will remain an empty primary. However,
in #970, we found a timing issue: if the replica receives CLUSTER SETSLOT first and only
then receives the gossip change, it will not trigger replica migration.

We perform replica migration explicitly in CLUSTER SETSLOT in this case.

Closes #970.
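
For illustration, a minimal, self-contained C sketch of the decision this change adds. All identifiers are hypothetical, not the actual cluster_legacy.c code; it only models the rule that a replica applying CLUSTER SETSLOT should consult its own cluster-allow-replica-migration once its primary is left with zero slots, instead of waiting for the gossip update:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the check added to the CLUSTER SETSLOT path.
 * All names here are illustrative, not Valkey internals. */
typedef struct {
    bool is_replica;              /* is this node a replica? */
    int primary_slot_count;       /* slots left on its primary after SETSLOT */
    bool allow_replica_migration; /* this node's own config value */
} node_view;

/* Before the fix, only the gossip path re-evaluated migration for a
 * replica; if CLUSTER SETSLOT arrived first, no equivalent check ran. */
static bool should_migrate_on_setslot(const node_view *me) {
    return me->is_replica && me->primary_slot_count == 0 && me->allow_replica_migration;
}

int main(void) {
    /* The replica processed CLUSTER SETSLOT first: its primary is now empty. */
    node_view replica = {.is_replica = true, .primary_slot_count = 0, .allow_replica_migration = true};
    printf("trigger replica migration: %s\n", should_migrate_on_setslot(&replica) ? "yes" : "no");
    return 0;
}
```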

@enjoy-binbin added the run-extra-tests label (Run extra tests on this PR; runs all tests from daily except valgrind and RESP) Sep 2, 2024
@enjoy-binbin (Member, Author) commented Sep 2, 2024

Logs (full logs at https://github.com/valkey-io/valkey/actions/runs/10658482333/job/29539601550):

replica
"""
 |####+'     .+#######+.     '+####|    Port: 27173
 |###|   .+###############+.   |###|    PID: 30240             
30240:M 02 Sep 2024 00:15:27.283 * No cluster configuration found, I'm 69c0773b7994df742f8a37862b06f32bebfeaee1


### Starting test valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT in tests/unit/cluster/replica-migration.tcl
30240:S 02 Sep 2024 00:15:39.789 - Accepted 127.0.0.1:37220
30240:S 02 Sep 2024 00:15:39.792 * Migrating slot 0 to node 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c ()
30240:S 02 Sep 2024 00:15:39.794 * Assigning slot 0 to node 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c () in shard fe55243c7e785a1e139292a70793f52b5d83da85
30240:S 02 Sep 2024 00:15:39.806 - Client closed connection id=8 addr=127.0.0.1:37220 laddr=127.0.0.1:27173 fd=28 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37888 events=r cmd=cluster|nodes user=default redir=-1 resp=3 lib-name= lib-ver= tot-net-in=56 tot-net-out=1982 tot-cmds=2
30240:S 02 Sep 2024 00:16:31.690 - Client closed connection id=6 addr=127.0.0.1:27177 laddr=127.0.0.1:39568 fd=21 name= age=61 idle=2 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=40954 argv-mem=0 multi-mem=0 rbs=1024 rbp=36 obl=0 oll=0 omem=0 tot-mem=43008 events=r cmd=ping user=(superuser) redir=-1 resp=3 lib-name= lib-ver= tot-net-in=374 tot-net-out=2283 tot-cmds=11
30240:S 02 Sep 2024 00:16:31.690 * Connection with primary lost.
30240:S 02 Sep 2024 00:16:31.690 * Caching the disconnected primary state.
30240:S 02 Sep 2024 00:16:31.690 * Reconnecting to PRIMARY 127.0.0.1:27177
30240:S 02 Sep 2024 00:16:31.690 * PRIMARY <-> REPLICA sync started
30240:S 02 Sep 2024 00:16:31.690 # Error condition on socket for SYNC: Connection refused
30240:S 02 Sep 2024 00:16:31.787 * Connecting to PRIMARY 127.0.0.1:27177
30240:S 02 Sep 2024 00:16:31.787 * PRIMARY <-> REPLICA sync started
30240:S 02 Sep 2024 00:16:31.787 # Error condition on socket for SYNC: Connection refused
30240:S 02 Sep 2024 00:16:31.989 * NODE 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c () possibly failing.
30240:S 02 Sep 2024 00:16:32.293 * NODE ee32166e30d115bb4c0ab65b192919235b722eec () possibly failing.
30240:S 02 Sep 2024 00:16:32.293 # Cluster state changed: fail
"""


primary
"""

    .+########+'     '+########+.                                    
 .########+'     .+.     '+########.    Running in cluster mode
 |####+'     .+#######+.     '+####|    Port: 27177
 |###|   .+###############+.   |###|    PID: 30370                     
30370:M 02 Sep 2024 00:15:28.690 * No cluster configuration found, I'm 53cd7df0480f96d7b0146bbffc3b599675d00bc0

### Starting test valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT in tests/unit/cluster/replica-migration.tcl
30370:M 02 Sep 2024 00:15:39.788 - Accepted 127.0.0.1:55280
30370:M 02 Sep 2024 00:15:39.792 * Migrating slot 0 to node 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c ()
30370:M 02 Sep 2024 00:15:39.794 * Slot 0 is migrated from node 53cd7df0480f96d7b0146bbffc3b599675d00bc0 () in shard 5342c5f1e5dacf2e24c0ab2a8300c51713b9f90c to node 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c () in shard fe55243c7e785a1e139292a70793f52b5d83da85.
30370:M 02 Sep 2024 00:15:39.794 * Slot 0 is no longer being migrated to node 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c () in shard fe55243c7e785a1e139292a70793f52b5d83da85.
30370:M 02 Sep 2024 00:15:39.794 * My last slot was migrated to node 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c () in shard fe55243c7e785a1e139292a70793f52b5d83da85. I am now an empty primary.
30370:M 02 Sep 2024 00:15:39.801 * Assigning slot 0 to node 6e5a9dab27e4c4b56b63ad890d41bed40e7c0b7c () in shard fe55243c7e785a1e139292a70793f52b5d83da85
"""


codecov bot commented Sep 3, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.46%. Comparing base (fa348e2) to head (f9abd82).
Report is 8 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cluster_legacy.c | 85.71% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #981      +/-   ##
============================================
- Coverage     70.57%   70.46%   -0.12%     
============================================
  Files           114      114              
  Lines         61660    61664       +4     
============================================
- Hits          43515    43449      -66     
- Misses        18145    18215      +70     
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/networking.c | 88.41% <100.00%> (+0.15%) | ⬆️ |
| src/replication.c | 87.08% <100.00%> (+0.12%) | ⬆️ |
| src/server.h | 100.00% <ø> (ø) | |
| src/cluster_legacy.c | 86.39% <85.71%> (+0.23%) | ⬆️ |

... and 9 files with indirect coverage changes

@PingXie (Member) commented Sep 3, 2024

@enjoy-binbin, your proposal looks intuitive and reasonable but I don't know if it matches the original intent of this flag.

@madolson you mentioned in today's meeting that, in a k8s deployment that lacks a control plane, the engine is expected to make node placement decisions. This makes sense on a high level but I still don't think I understand the purpose of this flag. This flag is effective only in a scale down scenario. So with or without this flag, we will end up with surplus k8s nodes in the end, which is likely the ultimate purpose of this scale down operation. The difference is that without this flag, these surplus pods will be hosting empty shards; while with this flag, we don't have empty shards but we will have more replicas. So if the goal is to scale down the k8s cluster, I would have thought that it is easier to identify empty shards/pods than trying to remove extra replicas? I must've missed some important context.

BTW, @enjoy-binbin, I believe some of the recent tests we worked on actually took a dependency on this split migration policy (so we could control the failover behavior/timing). I wonder if they will be broken by this PR.

In any case, this will be a breaking change.

@enjoy-binbin (Member, Author) commented

> your proposal looks intuitive and reasonable but I don't know if it matches the original intent of this flag.

Yes, indeed, I'm a bit fuzzy about this too.

> I believe some of the recent tests we worked on actually took a dependency on this split migration policy (so we could control the failover behavior/timing). I wonder if they will be broken by this PR.

The tests seem to be OK; they passed in CI.

I think in normal cases the configurations are consistent; in our tests we made them inconsistent, so here we are. There is also the opposite case: the primary allows migration and the replica does not. What happens if the primary loses its last slot? In our code (both unstable and this branch), both will become replicas. In that case, what does disallowing replica-migration mean on the replica side?

I think when redis/redis#5285 was introduced, a better way might have been to add an allow_primary_migration and an allow_replica_migration, as sketched below.
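
A purely hypothetical sketch of that split; neither option exists in valkey.conf today:

```
# Hypothetical config split (illustrative only, NOT existing options):
allow_primary_migration yes   # may an empty primary demote itself and follow its slots
allow_replica_migration yes   # may a replica migrate away to cover an orphaned primary
```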

If we are too worried, we can skip it. It's also good that the failing test exposed the problem so we can think about it later. (I think it has been there all along.)


@madolson (Member) commented Sep 3, 2024

> @madolson you mentioned in today's meeting that, in a k8s deployment that lacks a control plane, the engine is expected to make node placement decisions. This makes sense on a high level but I still don't think I understand the purpose of this flag. This flag is effective only in a scale down scenario. So with or without this flag, we will end up with surplus k8s nodes in the end, which is likely the ultimate purpose of this scale down operation. The difference is that without this flag, these surplus pods will be hosting empty shards; while with this flag, we don't have empty shards but we will have more replicas. So if the goal is to scale down the k8s cluster, I would have thought that it is easier to identify empty shards/pods than trying to remove extra replicas? I must've missed some important context.

I was talking about the replica barrier in conjunction with replica migration. The idea is that you have, say, 7 total pods, and then 3 primaries, each with a replica. You set the replica barrier to 1. If a node fails, that shard will be down to one node, so the extra replica will migrate over and fill in for the missing node while the other pod eventually comes back online. FWIW, I know very few people who do this.

However, some folks like to have strict shards, and they don't want replicas to ever migrate between nodes. This flag was originally introduced to handle the edge case where someone wanted to migrate data away from the shard and then delete the pods; they didn't want the replicas to follow.

So the tl;dr is that replica migration has larger implications, but this flag was introduced to solve a specific problem.
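
For context, a hedged valkey.conf fragment showing the two knobs discussed here; the values are illustrative, not a recommendation:

```
# Minimum number of replicas a primary must keep before one of its
# replicas may migrate to an orphaned primary.
cluster-migration-barrier 1
# Whether this node takes part in replica migration at all; the flag
# this PR also evaluates on replicas in the CLUSTER SETSLOT path.
cluster-allow-replica-migration yes
```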

> Better handling of replica-migration when primary and replica have different cluster-allow-replica-migration

My honest impression of this configuration is that I don't care if they're different. Like, you can do a lot of things that are inconsistent. I would rather warn users about inconsistent configs than try to handle it perfectly.

@enjoy-binbin (Member, Author) commented

Discussed this with @PingXie offline: we will remove it from 8.0 and consider it in the future. We will ignore this case in the failing test so that we can get a green CI.

… in advance

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin changed the title from "Better handling of replica-migration when primary and replica have different cluster-allow-replica-migration" to "Fix replica unable to trigger migration when it received CLUSTER SETSLOT in advance" Sep 11, 2024
@enjoy-binbin (Member, Author) commented

I backed up the old code to a different branch: https://github.com/enjoy-binbin/valkey/tree/better_replica_migration_bak

@PingXie I guess we can get this change into 8.0? I tested it locally and it works.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin added the release-notes label (This issue should get a line item in the release notes) Sep 11, 2024
…migration

Signed-off-by: Binbin <binloveplay1314@qq.com>
@PingXie (Member) left a comment


LGTM.

The hostname test seems to have failed quite consistently?

src/cluster_legacy.c (review thread resolved)
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin (Member, Author) commented

The hostname test got improved in #1016; I have rebased onto it and hope that solves it.

Co-authored-by: Ping Xie <pingxie@outlook.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
src/cluster_legacy.c (outdated review thread, resolved)
Signed-off-by: Binbin <binloveplay1314@qq.com>
src/cluster_legacy.c (outdated review thread, resolved)
src/cluster_legacy.c (outdated review thread, resolved)
Signed-off-by: Binbin <binloveplay1314@qq.com>
src/cluster_legacy.c (outdated review thread, resolved)
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin (Member, Author) commented

@madolson @zuiderkwast do you want to take a look at this one too, in case I messed something up?

src/cluster_legacy.c (outdated review thread, resolved)
src/networking.c (outdated review thread, resolved)
src/replication.c (outdated review thread, resolved)
enjoy-binbin and others added 2 commits September 13, 2024 13:44
Co-authored-by: Ping Xie <pingxie@outlook.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@PingXie (Member) commented Sep 13, 2024

Sorry, my empty shard tests are still failing. Let me disable them again to unblock others.

@PingXie PingXie merged commit dcc7678 into valkey-io:unstable Sep 13, 2024
56 checks passed
PingXie added a commit to PingXie/valkey that referenced this pull request Sep 13, 2024
… in advance (valkey-io#981)

Fix timing issue in evaluating `cluster-allow-replica-migration` for replicas

There is a timing bug where the primary and replica have different 
`cluster-allow-replica-migration` settings. In issue valkey-io#970, we found that if 
the replica receives `CLUSTER SETSLOT` before the gossip update, it remains 
in the original shard. This happens because we only process the 
`cluster-allow-replica-migration` flag for primaries during `CLUSTER SETSLOT`.

This commit fixes the issue by also evaluating this flag for replicas in the 
`CLUSTER SETSLOT` path, ensuring correct replica migration behavior.

Closes valkey-io#970
---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
@PingXie (Member) commented Sep 13, 2024

We have seen more and more failures of #970 in both unstable and 8.0, so I went ahead and merged this PR. We can continue the discussion asynchronously.

@enjoy-binbin enjoy-binbin deleted the better_replica_migration branch September 14, 2024 00:18
PingXie added a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
… in advance (valkey-io#981)

PingXie added a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
… in advance (valkey-io#981)

PingXie added a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
… in advance (valkey-io#981)

enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Sep 14, 2024
… in advance (valkey-io#981)

@enjoy-binbin enjoy-binbin mentioned this pull request Sep 14, 2024
PingXie added a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
… in advance (valkey-io#981)

PingXie added a commit that referenced this pull request Sep 15, 2024
… in advance (#981)

naglera pushed a commit to naglera/placeholderkv that referenced this pull request Oct 10, 2024
… in advance (valkey-io#981)

Labels
release-notes — This issue should get a line item in the release notes
run-extra-tests — Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Test failure] valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT
3 participants